ABSTRACT

A study was conducted (1) to obtain an estimate of 
the amount of error in prediction that occurs when static measures 
are used to interpret work-sample performance; (2) to identify a
number of dynamic performance measures that reflect learning that 
might occur on the task; and (3) to compare the predictive accuracy 
of those measures to that of static measures. The participants in the 
study were 10 male and 10 female handicapped vocational evaluation 
clients. The clients practiced on a work sample involving a 
relatively simple psychomotor task for 5 consecutive work days (50 
trials/day). The data collected showed that the participants improved 
an average of 31 percent in performance speed over the 5 days, with 
11 exceeding the industrial standard by day 5. The accuracy of eight 
methods of predicting day 5 performance was investigated; all of the 
dynamic prediction methods proved superior to the traditional static 
work-sample measure. It was concluded that the use of day 1 total
response-time scores seriously underestimates the level of 
performance that a handicapped individual can potentially achieve on 
a task following practice, while analyzing the performance data with 
dynamic performance measures can result in significantly more 
accurate estimates of the level of performance that someone can 
achieve. It was suggested that microcomputers be used in data 
collection and analysis of dynamic performance measures. (KC) 

ABSTRACT



Vocational evaluators frequently use work samples to 
assess the vocational potential of handicapped individuals. 
One of the purposes of such assessment is the prediction of
specific jobs or job areas where the client would have the 
greatest likelihood of vocational success. In making those 
predictions, evaluators often rely upon what can be charac- 
terized as static performance measures, such as the mean or 
total time to complete a task. Those measures are static in 
that they do not reflect any performance changes that might 
be occurring during testing. For instance, they fail to 
reflect any improvement in performance that is occurring due 
to learning. The present study was conducted in order to 
obtain an estimate of the amount of error in prediction that 
occurs when static measures are used to interpret work-sample 
performance. A second purpose was to identify a number of 
dynamic performance measures (those which reflect any
learning that might occur on the task) and to compare the
predictive accuracy of those measures to that of static
measures. The third purpose was to use these data as the basis
for development of a software package on learning-curve anal- 
ysis for use on microcomputers. Together, these purposes are 
directed toward the goal of making learning-curve analysis an 
easily adoptable and valued tool within vocational evalua- 
tion. 



The participants in this study were 20 handicapped voca- 
tional evaluation clients, 10 males and 10 females. Those 
individuals practiced on a work sample involving a relatively 
simple psychomotor task for five consecutive work days (50 
trials/day). The latency of each response was automatically 
recorded by a microcomputer and used in the data analyses. 

The results of data analyses indicated that the partici- 
pants improved an average of 30.68% in performance speed over 
the five days of practice. Only 1 individual exceeded the 
industrial standard on the task by Day 1 but 11 did so on Day 
5. These findings suggest that using a static measure of 
Day-1 performance, such as the mean or total time for the 
session, seriously underestimates the level of performance 
the individual could attain if given practice at the task 
because it does not reflect the learning that would occur. 
No differences in performance were found between males and
females on this task. 

The accuracy of eight methods of predicting Day-5 per- 
formance was investigated. The data from either Day 1 alone 
or from Days 1-4 were used in these analyses. Three meas- 
ures of predictive accuracy were employed: the degree of 
correlation between predicted and obtained Day-5 scores, the 
number of classification errors (incorrectly predicting some- 
one to be above or below standard on Day 5) obtained with 
each prediction method, and the percentage of error in
predicting the Day-5 scores. The total time taken to complete
the Day-1 trials was used as the standard for comparison in 
these analyses since this measure appears to be the one which 
is typically used by vocational evaluators when assessing an 
individual's performance.

All of the dynamic prediction methods proved superior to 
the traditional static work-sample measure. It was found 
with all three accuracy measures that predictions derived by 
fitting the data from Days 1 - 4 to any of six different 
learning curves produced estimates that were significantly
more accurate than were obtained using the Day-1 total 
scores. Those six learning-curve equations, however, did not 
differ in accuracy on any of the measures. Fitting data from 
the first 50 trials (Day 1) to learning curves also produced 
significantly smaller percentage-of-error scores in
predicting Day-5 performance than did the use of the Day-1 total
scores as predictors. Interestingly, a relatively simple 
method — using the mean of the fastest 20% of the 50 trials 
performed on Day 1 (the "best 20% method") — was as accurate 
in predicting Day-5 performance as any of the learning curves 
which used data from Days 1-4. 

It was concluded that the use of Day-1 total response- 
time scores seriously underestimates the level of performance 
that a handicapped individual can potentially achieve on a 
task following practice. Underestimating the performance 
capacity of a person could lead to the erroneous conclusion
that the individual is incapable of performing a job well 
enough to pursue it as an occupation. The results also
suggested that analyzing the performance data with dynamic
performance measures (i.e., learning curves, best 20%, etc.) can
result in significantly more accurate estimates of the level
of performance that someone can achieve. Further research is 
needed, however, to determine which of several dynamic per- 
formance measures is most accurate over longer prediction 
intervals and with different work tasks. It was suggested 
that the use of microcomputers for the purposes of collecting 
and analyzing performance data could lead to the widespread 
adoption of dynamic performance measures by vocational evalu- 
ators. This could result in an increase in the accuracy of 
the predictions that are made about client job potential. 

I. INTRODUCTION

One of the tools which vocational evaluators have
traditionally used to determine the vocational potential of
handicapped persons is the simulated job or work sample. Neff
(1966) described this approach as an "effort to capitalize on
the virtues of both psychometric testing and job analysis,
while trying to avoid the limitations of the older
approaches." Work samples are structured assessment
situations that replicate the actual tasks involved in a
particular job. For example, a work sample designed to measure
someone's capacity to function as a lathe operator would 
require the individual to operate a lathe under observation. 
Proponents of the work-sample approach cite the simplicity in 
interpreting the results as a strong point. If the
individual demonstrates competency on a work sample such as a
lathe and the worker wants to be a lathe operator, then it is
logical to recommend placement in that vocation. But the
process of vocational evaluation and the role of work samples
within that process are not always so simple.

Interpretation of work-sample performance becomes more
susceptible to error when performance scores are [illegible]
and are used for prediction purposes. Usually a client's
performance is measured during a one-time administration of a
work sample. The obtained performance measure, such as the
total time to complete the work task, is then compared to
norms for competitively employed individuals or other
standards. Typically, the evaluator would then convert the
score to a percentile rank or some equivalent measure. The
latter would then be used as an aid in making decisions about
the likelihood of success if the client were trained and/or
placed in that job or occupational area.

As McCray (1979) has suggested, the above approach
probably leads to reasonably accurate decisions in those cases
where the client performs at the 80th percentile or higher
when compared to a competitively employed norm group. Such
clients probably will be successful at the job. Similarly,
when the client scores at the 20th percentile or lower, this
approach is probably accurate in classifying the client as
unlikely to reach the level of the industrial norm on the job.
This approach can become problematic, however, when
individuals perform in the intervening percentiles (the 21st
to 79th). In those cases, the predictions that are made about
the clients in the task area have questionable validity and
can lead to possibly damaging service decisions and
recommendations. The results of such erroneous decisions would
be the exclusion of capable individuals (false negatives) or
the inclusion of those who could not be successful (false
positives) at the task. In either case, the client would lose,
having missed an opportunity for success or having experienced
failure, perhaps unnecessarily.

Prediction is always associated with some degree of
error. The current methodology of work-sample testing in
vocational evaluation contributes, perhaps needlessly, to
that error. The problem is that static measures (i.e., mean
time, total time, or the number of pieces produced during a
given time on one day of evaluation) are used to predict
behavior. Further, even the variability within that fixed
period is ignored because an average measure of performance
is used (e.g., mean time). The use of such measures is based
upon the assumption that performance on tasks is relatively
stable or unchanging. Yet there are results from literally
hundreds of studies which demonstrate that people improve
with practice at virtually any task, particularly those
involving the learning of psychomotor skills (e.g., Bilodeau
& Bilodeau, 1961; Newell & Rosenbloom, 1981). Those studies
show that behavior is not static but dynamic and that change
in performance with practice is to be expected with most
tasks.

Another source of concern with the traditional 
work-sample approach is that it often compares the perform- 
ance of inexperienced individuals to norms for experienced 
individuals. It seems that increasing numbers of individuals 
in vocational evaluation have had little or no experience in 
the areas that work samples are designed to measure. In
addition, many individuals, experienced or inexperienced, do 
not perform well under test conditions. Both of those
factors tend to lead to the underestimation of the actual level
of skill the individual could attain following practice on
the task. For instance, Dunn (1976), who analyzed data col- 
lected by Botterbusch (1974) on college students who per- 
formed on the Stout U-bolt Assembly, found that only 15% of 
the males and 6% of the females met the industrial standard 
for the task on the first administration of the work sample. 
Chyatte (1976) found that handicapped individuals were even 
less likely to reach the level of the industrial standard 
when first exposed to a number of commercial work sample
tasks. Dunn's further analyses revealed that 55% of the 
males and 42% of the females in his study did meet the indus- 
trial standard by the end of their fourth practice session. 
Thus, these data support the concern about the validity and 
utility of single, static performance scores for prediction 
of vocational potential. Also, it seems clear that some 
individuals cannot perform at the level of industrial stan- 
dards initially, but could do so with practice. 

These concerns have been raised before (Dunn, 1976) and 
techniques have been offered to increase the accuracy of pre- 
dictions that are made using work-sample performance scores. 
A number of people have suggested using some measure of the 
rate at which the individual is learning the task. Such an 
approach would take advantage of the well established finding 
of a curvilinear relationship between performance level and 
the amount of practice an individual has had. The rate of
improvement on a task is typically very high initially, but
becomes smaller as the amount of practice increases. This 
relationship when graphically depicted is referred to as a 
learning curve and reflects the dynamic nature of performance 
changes with practice. 

Tillman (1971), who was one of the first to advocate a 
learning-curve approach, suggested that clients be allowed to 
practice on work samples until their performance no longer
showed improvement. The final level of performance could 
then be used to determine the suitability of training or 
placing the client in the job represented by the work sample.
Thus, this procedure would allow the people to learn as much 
as they were capable of prior to making a decision about 
their capacity at the task. A number of people (e.g., Dunn, 
1976) have rejected this approach as impractical, however, 
since performance continues to improve for many thousands of 
repetitions with some tasks. For instance, Crossman (1959) 
found that cigar makers continued to show significant 
improvements in performance for up to four years (over a mil- 
lion task repetitions). 

Dunn (1976) suggested that the use of individualized 
prediction equations (learning curves) could lead to more 
accurate estimates of client potential on a task. This 
approach would involve analyzing data from a relatively small 
number of performance trials with learning-curve formulas.
The parameter values obtained from those analyses would then
be used to make predictions (extrapolations) about the level 
of performance the client could eventually achieve if given 
ample practice. Dunn tested his assumption using the data 
collected by Botterbusch (1974). The total performance 
scores from the first three days of practice were used to 
predict the performance level on Day 4, the final session. 
Dunn found that the final performance level could be pre- 
dicted with less than 1% error, on the average. 

Dunn's (1976) findings clearly suggest that learning 
curves could potentially be used to obtain highly accurate 
estimates of the level of performance someone can achieve 
following practice. There are a number of questions which 
remain to be answered, however, concerning the practicality 
and accuracy of using learning curves for predictive pur- 
poses. For instance, the learning curve approach is often 
recommended and used by professionals and practitioners, but 
not for long. The method soon becomes too time consuming. 
Accurate recording is essential, the data must be analyzed 
and interpreted for each individual on each task, and 
alterations may have to be made to existing work-sample pro- 
cedures. In addition, evaluators often have difficulty 
interpreting other than "textbook" learning curves and have 
concerns about the relationship of such curves to existing 
norms.

It is also unclear which learning-curve formula would
result in the most accurate predictions. A wide variety of
formulas have been used to describe the performance changes
that occur with learning, including the one used by Dunn.
Three recent articles have made the argument that hyperbolic
(Y = K(X + C)/(X + C + R): Mazur & Hastie, 1978), modified
exponential (Y = A + BC^X: Noble, 1978), and power (geometric)
functions (Y = AX^B: Newell & Rosenbloom, 1981) provide the
best description of the learning that occurs on a wide
variety of both cognitive and psychomotor tasks. It is clear,
on the basis of the findings of those three studies, that the
formulas examined there provide reasonably accurate
descriptions of learning data. However, it is not clear which
formula provides the best description of learning and which
could provide the most accurate estimates of practiced
performance levels because the appropriate comparisons were
not made in those studies.

Present Research

This study is part of an effort by the Research and 
Training Center to enhance the utilization of dynamic meas- 
ures of vocational potential - those which incorporate indi- 
ces of change with practice - and reduce the amount of error 
in predictions, recommendations, and decisions about voca- 
tional potential. The purposes of the present study were to 
evaluate the extent of change that occurs in motor behavior 
in relatively brief periods, estimate the amount of error in 
prediction, and compare learning-curve equations. The second
phase of the Center's research is the development of computer 
software (programs), interface equipment, and techniques for 
utilization of microprocessors to allow learning-curve analy- 
ses to become an easily adoptable and valued 
vocational-evaluation tool. The specific objectives of this 
study were: 

1. To estimate the amount of error associated with a
static vocational-evaluation method.

2. To identify prediction methods which generate
practiced performance levels based upon initial
performance levels.

3. To determine which prediction method(s) is the most
accurate and practical.

4. To use these findings to develop a
microprocessor-based learning/performance analysis system.

The participants in this study were 20 handicapped 
vocational-evaluation clients. These individuals practiced
on a work sample for part of each day for five consecutive 
work days (50 trials/day). The latency of each trial was 
automatically recorded by a microcomputer and was used in the 
data analyses. In those analyses, the data from the early 
practice sessions (Day 1 alone or Days 1-4) were used to
predict the level of performance the clients reached during 
the final practice session (Day 5). 

Eight prediction methods were examined in the study. 
The first method consisted of the total-time score for the
first practice session. This static performance measure was
included because it appears to be the traditional method used 
by evaluators. Six of the prediction methods consisted of 
learning-curve equations. In addition to the hyperbolic, 
modified exponential, and geometric formulas described above, 
three other equations were examined. These included an
exponential equation (Y = AB^X: Spiegel, 1961), a two-parameter
hyperbolic equation (Y = A/X + B: Lippert, 1976), and a
log-linear equation (Y = A + B log X: Dunn, 1976). All
of the equations were chosen because they have previously 
been used to represent learning data. 
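
For reference, the six candidate equations can be written as plain functions of the practice index X. The Python sketch below is illustrative only: the function names are assumptions rather than the authors' software, the power-function exponent is written as B, and the natural logarithm is assumed for the log-linear form (the report does not state the base, which only rescales B).

```python
import math

# Illustrative forms of the six learning-curve equations named in the text.
# x is the practice index (e.g., day number); the return value is the
# predicted response time. Names and signatures are hypothetical.

def hyperbolic(x, k, c, r):
    """Three-parameter hyperbolic: Y = K(X + C) / (X + C + R)."""
    return k * (x + c) / (x + c + r)

def modified_exponential(x, a, b, c):
    """Modified exponential: Y = A + B * C**X."""
    return a + b * c ** x

def power(x, a, b):
    """Power (geometric): Y = A * X**B."""
    return a * x ** b

def exponential(x, a, b):
    """Exponential: Y = A * B**X."""
    return a * b ** x

def two_parameter_hyperbolic(x, a, b):
    """Two-parameter hyperbolic: Y = A / X + B."""
    return a / x + b

def log_linear(x, a, b):
    """Log-linear: Y = A + B * ln(X) (natural log assumed)."""
    return a + b * math.log(x)
```

With response time as the dependent measure, improvement with practice corresponds to a curve that decreases in X (for example, a negative B in the power and log-linear forms).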

The accuracy of what has been named the "best-20%
method" was also evaluated. This method was developed by the
present authors, who sought to find a practical yet accurate
prediction technique that was based on the well-demonstrated
fact that performance improves with practice. This method
consisted of using the mean of the fastest 20% of the trials
during the first practice session as the estimate of the
individual's final performance level. It was assumed that
the client would eventually improve with practice to the
point where his or her average response would equal the mean
of the fastest x percentage of trials during the initial
session. The value of 20% was chosen because it seemed to be
reasonable after examining the data.
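
The best-20% computation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name, the latency values, the assumption that latencies are in seconds, and the scaling of the per-trial estimate to a 50-trial session total are all hypothetical.

```python
def best_20_percent_estimate(latencies, fraction=0.20):
    """Mean of the fastest `fraction` of trials (smaller latency = faster)."""
    if not latencies:
        raise ValueError("at least one latency is required")
    n_fastest = max(1, round(len(latencies) * fraction))
    fastest = sorted(latencies)[:n_fastest]
    return sum(fastest) / n_fastest

# Made-up Day-1 latencies in seconds for 50 trials (not study data).
day1 = [30.0 - 0.2 * i for i in range(50)]

# Estimated per-trial time after practice: mean of the 10 fastest trials.
per_trial = best_20_percent_estimate(day1)

# Assumed scaling to a 50-trial session total, expressed in minutes.
predicted_total_minutes = per_trial * 50 / 60.0
```

The `fraction` argument corresponds to the "x percentage" in the text, so other cut-offs could be compared without changing the function.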

II. METHOD



Subjects

The subjects in this study were 20 handicapped adults 
(10 male, 10 female) who were undergoing vocational
evaluation at the Vocational Development Center (VDC) at the
University of Wisconsin-Stout. These individuals represented a
variety of handicaps, though none were severely handicapped.
They were randomly selected with the restriction that they 
could not participate if they had a disability which would 
make it impossible to perform on the work sample that was 
used during testing. Participation was voluntary and sub- 
jects were paid $1.50 for each practice session they com- 
pleted. The data from four subjects were not used because 
these subjects did not complete five practice sessions. Four 
additional subjects were chosen to participate in order to 
replace the discarded data. Informed consent was obtained 
from all subjects and they were treated in accordance with 
the policies and procedures established by the University of 
Wisconsin-Stout on the treatment of human subjects. 



Apparatus

The subjects in this study performed on the
Eye-Hand-Foot (EHF) Coordination work sample (Banks, 1974).
This is a relatively simple standardized psychomotor task
designed to test the ability of an individual to perform
tasks which require coordinated eye, hand, and foot
movements. Those abilities are required in a wide variety of
work tasks (e.g., machine operation, piloting, etc.). This
work sample requires the individual to attach a bolt to a 
block of wood in a prescribed manner. Assembly of an EHF 
unit is accomplished using a power drill that is activated by 
a foot switch. The results of a motion-time study conducted 
by Banks indicated that the "industrial standard" (mean time 
to complete the task by an "average" worker in industry) 
would be 10.80 minutes per 50 units. A second set of norms, 
which were developed at the Vocational Development Center 
using vocational evaluation clients, was also used. The lat- 
ter norms classified people as "above average" (< 14.66 min- 
utes), "average" (14.66 to 22.66 minutes), or "below average" 
( > 22.66 minutes) depending upon the amount of time taken to 
complete 50 items. The location of the items to be assembled 
(nuts, bolts, & wooden blocks) was reversed for right and 
left-handed people.
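
The two sets of norms described above amount to simple threshold rules on the total time for 50 units. The following Python sketch shows one way to encode them; the function and constant names are hypothetical, and the industrial standard is treated as met at 10.80 minutes or less, as the Results section states.

```python
INDUSTRIAL_STANDARD_MIN = 10.80   # minutes per 50 units (Banks, 1974)
VDC_ABOVE_AVERAGE_MAX = 14.66     # below this: "above average"
VDC_AVERAGE_MAX = 22.66           # up to this: "average"

def meets_industrial_standard(total_minutes):
    """True when the 50-unit total is at or under the industrial standard."""
    return total_minutes <= INDUSTRIAL_STANDARD_MIN

def vdc_classification(total_minutes):
    """Classify a 50-unit total time against the VDC client norms."""
    if total_minutes < VDC_ABOVE_AVERAGE_MAX:
        return "above average"
    if total_minutes <= VDC_AVERAGE_MAX:
        return "average"
    return "below average"
```

For example, a Day-1 total of 12.19 minutes would be classified as "above average" by the client norms while still falling short of the industrial standard.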

A microcomputer manufactured by Ohio Scientific Instru- 
ments (Model C4P) was used to collect data in this study. 
The device was linked, via parallel Input/Output ports, to a 
remote switch which indicated when a subject completed a 
trial. The computer was programmed to compute the elapsed 
time taken to complete each trial (assembly of one EHF unit). 
This information was stored in the computer memory for later 
data processing. The remote switch was located in a wooden
box which was placed on the floor next to the seated subject 
during each practice session. The final step in the assembly 
process required the subject to drop the completed unit into
the box, thus triggering the switch. 
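
The timing logic can be illustrated with a short Python sketch. It is not the program that ran on the C4P, which the report does not reproduce; it assumes that each trial is timed from the previous switch closure (or from the session start for the first trial), and the function name is hypothetical.

```python
def trial_latencies(start_time, switch_times):
    """Elapsed time for each trial, given the session start time and the
    times at which the remote switch closed (one closure per trial)."""
    latencies = []
    previous = start_time
    for t in switch_times:
        if t < previous:
            raise ValueError("switch times must be non-decreasing")
        latencies.append(t - previous)
        previous = t
    return latencies

# Made-up switch closure times in seconds since the session began.
example = trial_latencies(0.0, [20.0, 38.5, 55.0])
```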

Procedure

Subjects were initially given a standardized explanation 
and demonstration of the correct procedures to use in
assembling the EHF units. They then completed five untimed
practice trials under the direction of the experimenter. During
those trials, the experimenter pointed out any errors the
subject made and answered any questions about the procedure.
The subjects, who were tested individually, began testing 
immediately after the untimed practice trials. Each subject 
completed 50 trials per day for 5 consecutive work days
during free time while they were clients in evaluation at the
VDC. Thus, the study consisted of a repeated-measures design 
in which all subjects were treated identically. The depend- 
ent measure was the response latency for each of the 50 
repetitions completed each day. 

III. RESULTS

Several data analyses were conducted in addressing the 
research questions. The raw data consisted of response times
for the 250 repetitions (50 trials x 5 days) for each of the
20 clients. In the first analysis, the changes in perform- 
ance over the five sessions were examined. In the subsequent 
analysis of prediction accuracy, the amount of change in per- 
formance from Day 1 to Day 5 served as the criterion against 
which the different prediction methods were compared. One
set of prediction methods used the data from Days 1 - 4 to 
predict the Day-5 total score. A smaller set of methods used 
only Day-1 data to predict Day-5 total scores. 

Analysis of Performance Change 

The initial set of data analyses was conducted to 
determine how much performance improved with practice. 
Response times for the 50 trials on each day were summed to 
produce daily total response-time scores. Average daily 
total response-time scores across all subjects are given in 
Figure 1. The mean performance time for each successive
daily practice session became smaller, indicating an obvious 
improvement in performance. (Note that smaller response
times reflect better performance on the task.)
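
The daily total scores described above are simple sums over the 50 trials in each session. A minimal Python sketch follows; it assumes the latencies were stored in seconds and reported in minutes (the report does not state the recording unit), and the function name is hypothetical.

```python
def daily_totals_minutes(latencies_by_day):
    """Sum each day's trial latencies (assumed seconds) and convert to minutes."""
    return [sum(day) / 60.0 for day in latencies_by_day]

# Made-up data: two sessions of 50 trials at a constant pace.
totals = daily_totals_minutes([[12.0] * 50, [9.0] * 50])
```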

Figure 1. Mean total response time on the work sample for each
daily practice session (50 trials/session).

A 2 (Sex) by 5 (Practice Sessions) analysis of variance
(ANOVA) was computed, using the total daily response-time
scores for each subject, to determine whether performance
significantly changed with practice and whether males and
females differed at this task. The results of that analysis
are presented in Table 1. There was a significant effect for
practice only. Post-hoc analyses (Newman-Keuls tests)
indicated that performance significantly improved on each
succeeding day of practice; i.e., performance on Day 2 was
significantly better than on Day 1, performance on Day 3 was
better than on Day 2, and so on. The lack of significance
for the demographic variable of sex indicates that males and
females performed comparably on the task, as both groups
improved significantly with practice.

The total daily response-time scores from Day 1 and Day
5 were examined to determine the number of subjects who met
the industrial standard during each of those days, and to
determine the classification of each individual's performance
using client norms established at the VDC. Each subject's
response times for Days 1 and 5 are listed in Table 2, where
it is evident that the performance of every subject improved
from Day 1 to Day 5. It can also be seen that only one of
the twenty clients (5% of the group) met the industrial
standard (a score of 10.80 minutes or less) on Day 1 but that 11
(55%) of the clients did so on Day 5. A chi-square (χ²) test
for related samples (Siegel, 1956) indicated that this change
represented a significant increase in the number of
individuals who met the industrial standard (χ²(1) = 8.1, p < .01).
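
The reported statistic can be reproduced under the assumption that the related-samples test was the McNemar test with continuity correction. Because every subject improved, the one client who met the standard on Day 1 also met it on Day 5, so 10 clients changed from "not met" to "met" and none changed in the other direction. The Python function name below is hypothetical.

```python
def mcnemar_chi_square(gained, lost):
    """McNemar chi-square with continuity correction for two related
    samples. `gained` and `lost` are the counts of cases that changed
    category in each direction; unchanged cases do not enter the test."""
    changed = gained + lost
    if changed == 0:
        raise ValueError("no cases changed category")
    return (abs(gained - lost) - 1) ** 2 / changed

# 10 clients newly met the industrial standard by Day 5; none lost it.
chi2 = mcnemar_chi_square(gained=10, lost=0)   # (10 - 1)**2 / 10
```

The result, 8.1 with one degree of freedom, agrees with the value given in the text.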

TABLE 1

Summary of Analysis of Variance
of Total Times and Group Means

                          Group
Source                    Means     df      MS       F       p

Sex (S)                              1     8.84     .15    >.10
   Male                   12.56
   Female                 13.16
Error b                             18    58.69

Practice Sessions (P)                4    79.90   48.33   <.001
   Day 1                  16.10
   Day 2                  13.22
   Day 3                  12.42
   Day 4                  11.66
   Day 5                  10.93
S x P                                4     1.75    1.06    >.10
Error w                             72     1.65


TABLE 2

Daily Performance Times (in Minutes)
for Each Client on Days 1 and 5

Client #   Day 1         Day 5          Client #   Day 1         Day 5

    1      12.19 b        8.30 a,b         11      16.60 c       12.82 b
    2      [illegible]   [illegible]       12      [illegible]   [illegible]
    3      24.43 d       12.23 b           13      15.24 c       [illegible]
    4      22.72 d       14.03 b           14      13.64 b        7.14 a,b
    5      15.40 c        9.64 a,b         15      12.37 b        8.71 a,b
    6      13.20 b       10.19 a,b         16      15.50 c       11.11 b
    7      16.71 c       10.16 a,b         17      12.43 b        8.57 a,b
    8      15.03 c       11.84 b           18      10.42 a,b      9.53 a,b
    9      15.29 c       11.42 b           19      26.83 d       15.12 c
   10      12.28 b        8.19 a,b         20      12.85 b       10.67 a,b

a Exceeded industrial norm (score < 10.80 minutes) for 50 trials.
b Classified as above average by VDC norms.
c Classified as average by VDC norms.
d Classified as below average by VDC norms.

The clients also showed marked improvement with respect to
the VDC client norms. On Day 1, eight clients (40%) performed
"above average", nine clients (45%) were "average", and three
clients (15%) were "below average". By Day 5, however, 18
clients (90%) performed above average and the remaining 2
(10%) performed at the average level. Thus, the amount of
error made when classifying people as likely to be successful
or unsuccessful following practice is considerable when the
prediction is based on a static-performance measure obtained
from a single administration of the work sample.

Further analyses of the data were performed in order to 
determine the amount of improvement that occurred across 
practice sessions. For each client, the percentage of 
improvement between Days 1 and 5 was computed using the
following formula: % Improvement = 100 * (1 - (Day-5
score/Day-1 score)). The results of those analyses indicated
that the mean rate of improvement for the clients was 30.68%,
with a standard deviation of 10.42%. The smallest amount of 
improvement was 14% and the largest was 50%. Had the per- 
formance scores for Day 1 been used as the only estimates of 
the ability of the clients who participated in this study, 
their actual capacity would have been underestimated by as 
much as 50%. Note that even this is a conservative estimate 
of the error since these individuals probably would have con- 
tinued to improve with additional practice or training on the 
task beyond Day 5. 
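
As a worked illustration, the percentage of improvement can be computed as the proportional reduction in total time from Day 1 to Day 5, so that faster Day-5 times yield positive values. The Python function name is hypothetical; the example uses the Day-1 and Day-5 totals listed for Client 3 in Table 2.

```python
def percent_improvement(day1_minutes, day5_minutes):
    """Percentage reduction in total response time from Day 1 to Day 5."""
    return 100.0 * (1.0 - day5_minutes / day1_minutes)

# Client 3 in Table 2: 24.43 minutes on Day 1, 12.23 minutes on Day 5.
client3 = percent_improvement(24.43, 12.23)
```

For this client the value is just under 50%, which matches the largest improvement reported in the text.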

Comparison of Prediction Formulas

The purpose of the following analyses was to determine 
how accurately the level of performance reached on Day 5 
could be estimated using a variety of prediction methods.
The accuracy of each method was assessed by examining 1) the
degree of correlation between predicted and obtained Day-5
scores; 2) the number of classification errors (incorrectly 
assigning people as above or below industrial standard on Day 
5) made using each prediction method; and 3) the percentage 
of error in prediction resulting from the use of each method.
The data used in these analyses consisted of the total
response-time scores from Days 1 - 4 or the response-time
scores for the first 50 trials (Day 1).
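
The three accuracy measures can be sketched in Python as follows. The function names and the sample values are hypothetical, not study data, and the percentage of error is assumed here to be the absolute difference between predicted and obtained scores relative to the obtained score, since the report does not give that formula at this point.

```python
import math

INDUSTRIAL_STANDARD_MIN = 10.80  # minutes per 50 units

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def classification_errors(predicted, obtained, standard=INDUSTRIAL_STANDARD_MIN):
    """Count cases placed on the wrong side of the standard
    (false positives plus false negatives)."""
    return sum((p <= standard) != (o <= standard)
               for p, o in zip(predicted, obtained))

def mean_percent_error(predicted, obtained):
    """Mean absolute error as a percentage of the obtained score (assumed form)."""
    errors = [abs(p - o) / o * 100.0 for p, o in zip(predicted, obtained)]
    return sum(errors) / len(errors)

# Made-up predicted and obtained Day-5 totals (minutes) for four clients.
predicted = [9.0, 11.5, 10.5, 13.0]
obtained = [8.5, 10.6, 11.0, 12.5]

r = pearson_r(predicted, obtained)
n_errors = classification_errors(predicted, obtained)
pct_error = mean_percent_error(predicted, obtained)
```

In this made-up example the second client is a false negative and the third a false positive, which shows that a high correlation can coexist with classification errors near the standard.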

Accuracy of predictions based on Day 1 - 4 data. The
first measure of accuracy was a comparison of the degree to
which six learning-curve equations and the Day-1 total scores
(the measure typically used by evaluators) could predict
Day-5 total scores. The data used with each learning-curve
equation consisted of the total scores from Days 1 - 4 for
each subject. Each of the daily scores represented the total
response time for the 50 trials the subject completed that
day. For each learning curve, a least-squares fit to the
data was calculated and the parameter values that resulted
were used to estimate the level of performance the subject
obtained on Day 5. Those predicted scores were then
correlated with the scores the subjects actually obtained on
Day 5. The Day-1 total scores were also correlated with the
obtained Day-5 scores.
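
The fitting-and-extrapolation step described above can be
illustrated with the log-linear curve, Y = A + (B * log X),
which is linear in its parameters and so has a closed-form
least-squares solution. The following is only a minimal
sketch: the function names and the four daily totals are
hypothetical and are not taken from the study, and the
programs actually used by the authors are not described in
this report.

```python
import math


def fit_log_linear(days, scores):
    """Ordinary least-squares fit of Y = A + B * log(X).

    days   -- practice-day numbers (X), e.g. [1, 2, 3, 4]
    scores -- total response time for each day (Y)
    Returns the fitted parameters (A, B).
    """
    xs = [math.log(d) for d in days]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_log_linear(a, b, day):
    """Extrapolate the fitted curve to a later day."""
    return a + b * math.log(day)


# Hypothetical daily totals (seconds for 50 trials); NOT study data.
days = [1, 2, 3, 4]
totals = [600.0, 520.0, 480.0, 455.0]

a, b = fit_log_linear(days, totals)
day5_prediction = predict_log_linear(a, b, 5)
print(f"A = {a:.2f}, B = {b:.2f}, predicted Day-5 total = {day5_prediction:.2f}")
```

Curves that are nonlinear in their parameters (for example,
the three-parameter forms) would need an iterative fitting
routine instead of the closed-form solution used here.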

The results of the correlational analyses are presented 
in Column A of Table 3, where it can be seen that the
correlations between the predicted and obtained scores for
the six learning-curve equations were higher (rs ranging from
.927 to .968) than the correlation between the Day-1 total
scores and the Day-5 total scores (r = .804). A series of t
tests for differences between correlations indicated that the
correlations obtained using the learning curves were all
significantly higher than the correlation between Day-1 total
scores and Day-5 scores (all ts(17) > 3.17, p < .01). No
other significant differences were obtained. Although the 
correlation between the total response-time scores for Day 1 
and Day 5 was reasonably high, it was found that the correla- 
tion between the Day-1 total scores and the total scores for 
each succeeding day became smaller. For instance, the corre- 
lation between the Day-1 total scores and the Day-2 total 
scores was .90, whereas the value dropped to .804 by Day 5. 
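
The report does not say which test for related correlations
was used. One common choice that yields n - 3 = 17 degrees of
freedom with 20 subjects is Hotelling's t test for two
correlations that share a variable, and the sketch below
assumes that test. It also needs the correlation between the
two predictors, which is not reported here; the value of .85
used in the example is purely hypothetical.

```python
import math


def hotelling_t(r12, r13, r23, n):
    """Hotelling's t for two dependent correlations sharing variable 1.

    r12 -- correlation of the criterion with predictor A
    r13 -- correlation of the criterion with predictor B
    r23 -- correlation between predictors A and B
    n   -- number of subjects
    Returns (t, degrees_of_freedom) with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))
    return t, n - 3


# r12 and r13 are the Column A values for the best learning curve and
# the Day-1 total scores; r23 = .85 is a HYPOTHETICAL value.
t_value, df = hotelling_t(0.968, 0.804, 0.85, 20)
print(f"t({df}) = {t_value:.2f}")
```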

                             TABLE 3

           Summary of Statistics Comparing Accuracy of
       Different Learning Curves (Using Data from Days 1-4)
      and Day-1 Total Performance Scores in Predicting Day-5
                         Obtained Scores

                                A                 B               C
                         Product-Moment
                          Correlations        Number of     Mean % (s.d.)
                       (Pred. vs. Obtained  Classification   Prediction
Prediction Methods        Day-5 Scores)        Errors**         Error
-------------------------------------------------------------------------
Learning Curves:
  Y = K((X+C)/(X+C+R))        .968*               1          6.57 (5.74)
  Y = AB^X                    .965*               2         10.13 (6.58)
  Y = A + (B * log X)         .954*               1          8.30 (5.89)
  Y = A/X + B                 .947*               0          7.88 (6.02)
  Y = AX^B                    .93*                1          7.45 (5.64)
  Y = A + BC^X                .927*               1          8.41 (7.29)
Day-1 Total Scores            .804               10         30.68 (10.42)
-------------------------------------------------------------------------
*Results of a t test for related correlations indicated that this value
significantly differed from the value for the Day-1 Total Scores.

**Consisted of false positive and false negative predictions.

The second measure of accuracy was an analysis of the
number of classification errors that were made using each
prediction method. A classification error was made when the
use of a particular prediction method indicated that a client
would exceed the industrial standard for the work sample on
Day 5 but did not (a false positive), and when the prediction
indicated that the individual would be below the industrial
standard but the client actually exceeded it (a false
negative). For each method, the predicted Day-5 score of each
subject was compared to the actual Day-5 status of the client
(above or below standard) and the prediction was classified
as a correct classification or a classification error. The
number of classification errors produced by the six learning
curves and the Day-1 total scores are presented in Column B
of Table 3. As can be seen there, the learning curves were
highly efficient in predicting whether clients would be above
or below the industrial standard on Day 5. The use of the
Day-1 total scores was not as efficient, however, inasmuch as
10 of the 20 clients were misclassified using this method.
Tests for the significance of differences between proportions
indicated that the Day-1 total scores produced significantly
more classification errors than any of the learning curves
(all zs > 2.76, p < .01).
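
The classification tally and the test on proportions can be
sketched as follows. This is an illustration only. Because
the scores are total response times, the sketch assumes that
a client "exceeds" the standard when the time is at or below
the standard time; the function names and the example scores
are hypothetical. The z test shown is the pooled test for two
independent proportions, which is an assumption about the
procedure. Applied to the largest learning-curve error count
in Table 3 (2 of 20) against the Day-1 count (10 of 20), it
gives a value of about 2.76, in line with the bound reported
above.

```python
import math


def count_classification_errors(predicted, obtained, standard):
    """Return (false_positives, false_negatives).

    Assumption: scores are response times, so a client meets the
    industrial standard when the time is <= the standard time.
    """
    false_pos = 0
    false_neg = 0
    for pred, actual in zip(predicted, obtained):
        predicted_meets = pred <= standard
        actually_meets = actual <= standard
        if predicted_meets and not actually_meets:
            false_pos += 1
        elif actually_meets and not predicted_meets:
            false_neg += 1
    return false_pos, false_neg


def two_proportion_z(errors_a, errors_b, n):
    """Pooled z statistic for two error proportions, each based on n cases."""
    p_a = errors_a / n
    p_b = errors_b / n
    pooled = (errors_a + errors_b) / (2 * n)
    std_error = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (p_a - p_b) / std_error


# Hypothetical predicted and obtained Day-5 times for four clients.
fp, fn = count_classification_errors(
    predicted=[100, 120, 90, 130],
    obtained=[95, 105, 112, 125],
    standard=110,
)
print(f"false positives = {fp}, false negatives = {fn}")

# Error counts taken from Column B of Table 3 (Day-1 scores vs. Y = AB^X).
z = two_proportion_z(10, 2, 20)
print(f"z = {z:.2f}")
```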

A third measure of accuracy analyzed the percentage of
error in prediction that was obtained using each of the
learning curves and the Day-1 total scores. This measure was
obtained using the following formula: % of Error in
Prediction = 100 * (1 - (Predicted Value/Obtained Value)).
The mean percentages of error in prediction (and the standard
deviations) are presented in Column C of Table 3. It can be
seen there that the mean percentage of error for the learning
curves ranged from 6.57% to 10.13%, whereas the mean error
rate using the Day-1 total scores as predictors was 30.68%.
The percentage-of-error data summarized in Table 3 were
analyzed with an ANOVA and Newman-Keuls tests and the results
indicated that the use of the Day-1 total scores produced
significantly higher percentage-of-error scores, on the
average, than did the use of any of the learning curves
(F(6,133) = 30.34, p < .001). Thus, the results of the
analysis of this measure, as well as the analyses of the
correlational and classification measures discussed above,
indicate that the learning curves produce more accurate
predictions of Day-5 scores than does the use of the Day-1
total scores. As with the previous measures, however, no
significant differences in predictive accuracy were found
among any of the learning-curve equations using Day 1 - 4
data.
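
The percentage-of-error formula can be written out directly.
The sketch below is illustrative: the function names and the
example scores are hypothetical. Two choices in it are
assumptions, since the report does not state them: the size
of each error is averaged without regard to its sign, and the
standard deviation is the sample (n - 1) form.

```python
from statistics import mean, stdev


def percent_error(predicted, obtained):
    """% of Error in Prediction = 100 * (1 - (Predicted / Obtained))."""
    return 100 * (1 - predicted / obtained)


def summarize_errors(predicted_scores, obtained_scores):
    """Mean and s.d. of the absolute percentage errors across clients.

    Assumption: absolute values are averaged, and the s.d. is the
    sample standard deviation.
    """
    errors = [
        abs(percent_error(p, o))
        for p, o in zip(predicted_scores, obtained_scores)
    ]
    return mean(errors), stdev(errors)


# Hypothetical predicted and obtained Day-5 totals for three clients.
predicted = [430.0, 512.0, 390.0]
obtained = [450.0, 480.0, 400.0]
avg, sd = summarize_errors(predicted, obtained)
print(f"mean % error = {avg:.2f} (s.d. = {sd:.2f})")
```

When the Day-1 total score is itself used as the predicted
value, this formula has the same form as the percentage
improvement formula, which is consistent with the Day-1 row
of Table 3 matching the mean improvement of 30.68% (s.d. =
10.42%).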

Accuracy of predictions using Day-1-only data. A second
set of analyses was conducted which examined the accuracy of
predictions made using simply the data from Trials 1 - 50
("Day 1 only"). Recall that the above learning-curve analyses
used total response-time data from Days 1 - 4. The primary
reason for using Day-1-only data in the present analyses was
to determine how accurate learning curves would be using the
data more likely to be available in a traditional work-sample
situation. It is important to have such information because
it seems likely that learning curves will not gain widespread
usage if the amount of data needed to produce accurate
predictions is much more than is currently collected with
most work samples. Thus, these analyses were conducted
largely to assess the practicality of using learning curves
to evaluate work-sample performance.

A second reason for analyzing the accuracy of learning
curves using Day-1-only data was to determine whether the
predictions based upon the different learning curves might
significantly differ in accuracy if the predictions were made
over a longer prediction interval (i.e., predicting from Day
1 to Day 5 rather than from Days 1 - 4 to Day 5). As was the
case with the analyses of the data from Days 1 - 4, the
predictions based upon the Day-1-only data were examined in
terms of the correlation between predicted and obtained Day-5
scores, the number of classification errors, and the
percentage-of-error-in-prediction measures. Not all of the
learning curves used in the analyses discussed above were
used in these analyses.¹ The accuracy of the Day-1 total
scores was again used as a standard for comparison when
evaluating the accuracy of the predictions made using the
Day-1-only data.

In addition to examining the accuracy of learning curves
using the Day-1-only data, the "best-20%" method was also
studied. This measure was included because of its ease of
computation. The measure consisted of identifying the fastest
20% of Trials 1 - 50 and computing the mean of those
responses. That score was then used as the estimate of the
mean of the 50 trials completed on Day 5.

¹Information on the accuracy of two formulas (Y = A + BC^X and Y = AX^B) was not
included because the results of the initial analyses indicated that those formulas were
less accurate than the Day-1 total scores. Two other formulas (Y = AB^X and
Y = K((X+C)/(X+C+R))) were not included because the computer programs which evaluate
those formulas could not efficiently handle such a large amount of data. Those programs
use an iterative process to estimate the best-fitting parameter values and this method
proved to be very time consuming with 50 data values. Since practicality was one of the
criteria used to evaluate the different prediction methods, it was decided not to include
those two curves in the analyses of the data from Trials 1 - 50.
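
The best-20% computation is simple enough to show in a few
lines. This is a minimal sketch: the function name, the
`fraction` parameter, and the simulated response times are
hypothetical, and the rule for rounding the number of trials
kept is an assumption (it makes no difference for 20% of 50
trials, which is exactly 10).

```python
def best_fraction_mean(times, fraction=0.20):
    """Mean of the fastest `fraction` of the response times.

    With 50 trials and fraction=0.20, the 10 shortest response times
    are averaged; the result serves as the estimate of the mean
    response time on Day 5.
    """
    keep = max(1, round(len(times) * fraction))
    fastest = sorted(times)[:keep]
    return sum(fastest) / keep


# Hypothetical Day-1 response times (seconds) for Trials 1 - 50.
day1_times = [14.0 - 0.08 * trial + (trial % 5) * 0.3 for trial in range(1, 51)]

estimate = best_fraction_mean(day1_times)
print(f"estimated Day-5 mean response time = {estimate:.2f} s")
```

Because the fraction is a parameter, the same function could
be used to try other cutoffs on the same data.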

Table 4 presents a summary of the accuracy of the
predictions that were made based upon the Day-1-only data and
the Day-1 total scores in predicting Day-5 total scores. As
can be seen in the table, the correlation between predicted
and obtained Day-5 scores ranged from .57 for the log-linear
equation to .83 for the "best-20%" method. Comparisons of
those r values, using t tests for related correlations,
indicated no significant differences between the correlations
obtained using the different methods (all ts < 1.13, p >
.05). Significant differences were found, however, for both
the number of classification errors and the mean percentage
of error in prediction measures for the different prediction
methods. Analyses (t tests) for differences between
proportions indicated that the number of classification
errors made using the Day-1 total scores was significantly
higher than either the best-20% method or the use of the
log-linear curve. Also, with respect to the percentage of
error in prediction measure, an ANOVA and post-hoc
Newman-Keuls tests indicated that the best-20% method was
more accurate than the two learning curves, which in turn
were more accurate than using the Day-1 total scores in
predicting Day-5 scores (F(3,76) = 8.22, p < .001).

                             TABLE 4

           Summary of Statistics Comparing Accuracy of
          Different Prediction Methods (Using Data From
             Trials 1 - 50) and Day-1 Total Scores in
                 Predicting Day-5 Obtained Scores

                              A                B                 C
                        Product-Moment     Number of       Mean % (s.d.)
                         Correlations    Classification     Prediction
Prediction Method                           Errors*            Error
-------------------------------------------------------------------------
Best-20% Method              .835             3**         10.93 (7.36)***
Learning Curves:
  Y = A/X + B                .73              7           17.65 (14.51)***
  Y = A + (B * log X)        .57              4**         21.75 (17.04)***
Day-1 Total Scores           .804            10           30.68 (10.42)
-------------------------------------------------------------------------
*Classification errors consisted of false positive and false negative
predictions.

**Significantly differed from the Day-1 Total Scores.
***Significantly differed from Day-1 Total Scores

Overall analyses. One final ANOVA was computed, which
compared the accuracy of the predictions made with learning
curves based upon data from Days 1 - 4 and all of the
Day-1-only methods (i.e., Day-1 total scores, best-20%
method, and learning curves which used data from Trials 1 -
50). Thus, this analysis compared the accuracy of all
prediction methods, regardless of the amount or type of data
used to make the predictions. The percentage of error in
prediction was the only measure used in this analysis. All
learning curves were more accurate than simply using the
traditional Day-1 total scores in predicting Day-5 scores
(F(9,190) = 14.01, p < .001). The results of Newman-Keuls
tests also indicated that the learning curves which used data
from Days 1 - 4 and the best-20% method were significantly
more accurate than the two learning curves which used
Day-1-only data.
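
The reported degrees of freedom (9 and 190) are what a
one-way layout with 10 prediction methods and 20 clients per
method produces. The sketch below shows how such an F ratio
and its degrees of freedom are computed. It is an
illustration under that assumption, with hypothetical
percentage-of-error scores; it does not reproduce the
authors' analysis or the Newman-Keuls follow-up tests.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups -- one list of scores per prediction method
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical percentage-of-error scores: 10 methods x 20 clients.
groups = [
    [5.0 + method + (client % 4) for client in range(20)]
    for method in range(10)
]
f_ratio, df_b, df_w = one_way_anova(groups)
print(f"F({df_b},{df_w}) = {f_ratio:.2f}")
```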






IV. DISCUSSION 

The results indicated that the handicapped individuals
who participated in this study improved dramatically on the
work sample. The clients improved an average of almost one
third in just five brief practice sessions. This increase in
performance following practice was reflected in the larger
number of individuals who met the industrial standard on Day
5 as opposed to Day 1. A similar shift toward higher
performance ratings was also found with respect to the VDC
client norms. These findings suggest that testing an
individual on a work sample only once and using the average
or total time for that session as an index of performance
capacity for that task can seriously underestimate the level
of performance that the individual is capable of achieving on
the task.

It was found that males and females did not differ in
performance on this task. These results are opposite to
those of Dunn (1976), who found that males performed better
than females on the Stout U-bolt Assembly, at least
initially. However, Noble (1978) concluded, after reviewing
the literature on psychomotor performance, that although sex
differences are found on many motor tasks, no such
differences are found on a large number of other motor tasks.
Thus, the fact that Dunn found a sex difference with the task
used in his study, but that none was found in this study, is
not unusual. It seems important, however, that vocational
evaluators know whether a sex difference can be expected on
any work sample that they use and the exact nature of that
difference. For instance, it would be important to know that
females do not perform as well as males early in training
(testing) on a given task but that they eventually "catch up"
to males with practice. Such a situation could result with
tasks on which males would normally have more prior
experience than females.

The results of the analyses of the three performance
measures were very similar. The findings of the
correlational analyses indicated that the predictions based
upon the data from Days 1 - 4 were more highly correlated
with the obtained Day-5 scores than the predictions based
upon any of the methods using data from Day 1 only (i.e.,
Day-1 total scores, the best-20% method, etc.). In terms of
the percentage-of-error measure, the results indicated that
the use of learning curves with data from Days 1 - 4 and the
best-20% method produced the most accurate estimates of Day-5
performance level. It was also found that the use of
learning curves which used data from Days 1 - 4, the best-20%
method, and one of the learning curves based upon data from
Trials 1 - 50 all produced fewer classification errors than
the use of the Day-1 total scores. Thus, it was consistently
found over the three measures of accuracy that the worst
predictor of Day-5 performance was the measure typically used
by evaluators: the total time score obtained on a one-time
administration of the work sample. The predictions made
using the data from Days 1 - 4 were more accurate than those
made using data from Day 1 only, with the notable exception
of the predictions made with the best-20% method. This
best-20% method, which used only the data from Trials 1 - 50
on the first day, was as accurate on two of the performance
measures as the methods which used data from four days.

Perhaps the most interesting finding with respect to the
percentage-of-error and the classification-error measures was
that the best-20% method produced predictions which were as
accurate as any of the learning-curve predictions based upon
the data from Days 1 - 4. This finding is important because
the best-20% method uses data that could be obtained in many
vocational evaluation processes. It would be much easier and
less time consuming if vocational evaluators needed to
collect data from only a single practice session to obtain an
accurate estimate of an individual's future performance
capacity.

It is encouraging that the best-20% method produced such
accurate predictions but there are still a number of
questions remaining about the technique. For instance, it is
unclear what percentage of trials would be optimal for use in
making predictions. The 20% value used in the analyses of
the results of the present study was arbitrarily chosen.
Perhaps taking the mean of the best 10% or 15% would have
resulted in more accurate predictions. Also, it is
hypothesized that a smaller number or percentage of trials
would be needed when making longer range predictions (e.g.,
predicting performance on Day 25) than when making short
range predictions, but that assumption has not been tested.

One issue that was only partially resolved in the present
study is the question of which learning-curve formula is
most accurate. When the data from Day 1 only were used, two
of the curves (the log-linear and the two-parameter
hyperbolic) were found to be more accurate on two of the
measures than the other curves. When the data from Days 1 -
4 were used, the six curves that were examined were found to
produce equally accurate predictions. These findings do not
provide persuasive evidence that any particular learning
curve should be used as opposed to the others. In fact, if
one were to make a recommendation about which prediction
method to use, based upon the present findings, the best-20%
method would probably be the most reasonable choice. This
method was found to be as accurate as any of the learning
curves yet is easier to compute and requires data from only
one practice session. This conclusion should be tempered,
however, by the possibility that future research might
demonstrate deficiencies in the accuracy of the best-20%
method. Research should examine performance over a larger
number of trials and with a number of different tasks to
further test the accuracy of this method. Despite this
caution, the best-20% method can certainly reduce the amount
of prediction error when compared to the use of the
traditional static performance measure of work-sample
performance.

The decision as to whether learning curves or some
static measures are used when evaluating work-sample
performance should probably depend upon the reason that the
work sample is administered. If the interest of the evaluator and
the client is in predicting whether the client is capable of 
becoming successfully employed at a particular job, then 
learning curves, or at least some measure reflecting perform- 
ance change with practice, should be used. If, on the other 
hand, the purpose of administering the work sample is to
document whether the client can or cannot perform at a given
level at this point in time, then the use of a static measure
might be appropriate. In most instances it would be
appropriate to use both the static and dynamic measures to
analyze the work-sample performance of an individual. This would
enable the evaluator to determine how well the client is cur- 
rently doing relative to other people (the norm group) and/or 
relative to some performance criterion (e.g., the industrial 
standard), and to estimate how well the individual could 
potentially do in the future. 

A drawback to the use of either learning curves or some
other dynamic performance measure is that such measures
require more work on the part of the evaluator. This is
because the use of dynamic performance measures requires the
evaluator to either record some performance measure on every
response or to collect data from more than one practice
session. It is the increased work load involved in their use
which has probably prevented dynamic performance measures
from becoming a widely adopted practice among vocational
evaluators. Apparently, the increase in time and effort that
their use entails has not been seen as offset by the
potential increase in accuracy that such measures could
provide. The use of microcomputers could prove to be
invaluable in this context. The advantage of using a
microcomputer to monitor the client's performance and then
analyze the data is that evaluators are not required to do
any more work than they currently do. Thus, a microcomputer
makes the use of dynamic performance measures more practical
and should lead to an increase in the use of such measures.

As was mentioned previously, the Research and Training
Center is currently examining the utility of employing
microcomputers to collect and analyze data on client
work-sample performance. To date, the Center's efforts have
primarily focused on the development of computer programs and
interfacing equipment. The present study was an initial
effort at evaluating the utility of the learning-curve
approach using microcomputers. Future efforts will involve a
demonstration/evaluation of the system in a number of
rehabilitation facilities. The interest of those efforts
will be in determining both the accuracy and reliability of
this approach, as well as its practicality.

Conclusions

The results of this study raise questions about the
appropriateness of using static measures of work-sample per- 
formance when the purpose of the assessment is to estimate 
someone's capacity to become successfully employed at the 
task represented by the work sample. This conclusion seems 
warranted by the finding that the handicapped subjects in 
this study increased dramatically in performance on the work 
sample with only five relatively brief practice sessions. 
This finding clearly suggests that the use of a static per- 
formance measure would seriously underestimate the perform- 
ance level that an individual could attain on many tasks if 
given ample practice. 

This study also examined the utility of using a number 
of different prediction techniques for the purpose of esti- 
mating someone's performance capacity on a work sample. It 
was found that the traditional static work-sample measures 
provided consistently worse estimates of the final perform- 
ance level than did any of the other techniques used in this 
study. This finding clearly supports the need to use
learning curves or other indices reflecting learning for
prediction purposes rather than the traditional static
measures such as the mean or total score. In this study, the
best-20% method proved to be as accurate as the six
learning-curve formulas that were examined and it was
suggested that this might be the best method to employ when
evaluating work-sample performance. The most obvious
advantage of this method is its practicality.

Further research is needed to gain additional informa- 
tion about the utility of using learning curves or the 
best-20% method to make predictions. For instance, it is 
still not known which method provides the most accurate esti- 
mates of performance over long prediction intervals. 
Research should also be conducted to develop software
programs for use on microcomputers. This rapidly advancing
technology could lead to an increase in the use of dynamic
performance measures for assessing the work-sample
performance of handicapped individuals. Hopefully, the
vocational predictions about clients will become increasingly
accurate as dynamic measures become a regularly used tool of
vocational evaluators.